The National Assessment of Educational Progress
(NAEP) and the Speech Communication Association (SCA) initiated a
pilot study to test the feasibility of assessing speaking and
listening skills. A pool of 56 items was developed and field
tested at four sites which represented a variety of national regions,
of size and type of cities, and of racial and ethnic populations.
There were significant differences between the responses of minority
and nonminority students. A panel of speech communication experts
hypothesized that minority students might have less specialized
vocabulary knowledge, a lower tolerance for long materials perceived
as boring, and less experience listening to the accents and speaking
rates of white speakers. The results of the NAEP/SCA project suggest
a clear need for further development and research in this area. In
addition, three guidelines were developed for researchers in the area
of listening ability: (1) focus on skills that are unique and central
to listening, (2) use short, interesting listening stimuli, and (3)
consider extraneous factors which might contribute to item bias;
these suggestions are intended to aid researchers rather than act as
a definitive guide. (JF)
ISSUES RELATED TO ASSESSING LISTENING ABILITY
Introduction 

During the fifties and early sixties, there was a flurry of research
and instructional development in the area of listening ability. Since
that time there has been little activity. Devine's (1978) recent summary
of listening research primarily restates the conclusions which were reached
by earlier reviewers (Caffrey, 1955; Toussaint, 1960; Russell, 1964; Dixon,
1964; Ducker and Patrie, 1964; Devine, 1967; Keller, 1969; Barker, 1971;
Weaver, 1972). However, reliance on established answers to questions about
listening appears to be misguided.

The impetus for this paper on the issues related to assessing listening
ability comes from rather disappointing results in a pilot effort to
develop listening measures for the National Assessment of Educational
Progress. Our complacency concerning the task did not prepare us for the
problems we encountered. We felt that listening ability, unlike the other
aspects of communication competence which we were trying to assess, was
fairly well defined. Furthermore, we were encouraged by the fact that
there already existed good models for assessing listening. However, the
results of our pilot efforts proved to be less than satisfactory. In fact,
they directed us to some major redevelopment and research in the area of
listening. In this paper I will share the results of the pilot effort of
assessing listening ability and indicate the implications of this
experience for future measurement of listening ability.
Brief Review of Listening Research, Instruction and Assessment

As indicated in the introduction, the major contributions toward
defining and studying listening ability were made during the fifties and
early sixties. Most of this activity involved:

1. defining listening ability, either as a unitary skill or a series
   of subskills;

2. exploring the relationship between listening ability and other
   factors, primarily verbal ability, reading ability and motivation;

3. developing and evaluating listening instructional approaches; and

4. developing tests of listening ability.

The results of the research on listening have not produced a single,
empirically based definition of listening ability (Devine, 1978). Instead,
a series of descriptions have emerged, mostly developed by those involved
in listening instruction and measurement. These definitions are primarily
based on a logical analysis of the listening process. Many follow
established descriptions of reading comprehension. However, they do not
indicate a clear hierarchy or scope and sequence of skills.

The research related to the relationship of listening and other
factors has substantiated a relatively high positive correlation between
listening and general verbal ability. Crook (1957) found a correlation of
.70 between listening comprehension and intelligence. Haberland (1959)
found generally high positive correlations between various listening and
verbal ability measures. Likewise, the relationship between listening and
reading has been well established. Brown (1965) reported correlations
ranging from .76 to .82 between listening and reading for fourth, fifth
and sixth graders. Ducker (1965) reported an average correlation between
listening and reading of .57. However, it is possible that the
relationship between listening and reading may be explained by the
overlap of both these abilities with general verbal ability.
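The suggestion that verbal ability may account for the listening-reading correlation can be examined with a first-order partial correlation, which removes from each variable the part predictable from a third. The sketch below is an illustration under stated assumptions, not an analysis reported in any study cited here: it plugs in Ducker's .57 for listening and reading and Crook's .70 (listening with intelligence) as a stand-in for listening with verbal ability, while the .70 reading-verbal value is purely hypothetical. Because the figures come from different samples, the result only shows how the formula behaves.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial r)."""
    denom = sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("partial correlation is undefined when |r| with z is 1")
    return (r_xy - r_xz * r_yz) / denom


r_listen_read = 0.57    # Ducker (1965), average listening-reading correlation
r_listen_verbal = 0.70  # Crook (1957), listening-intelligence, used as a proxy
r_read_verbal = 0.70    # hypothetical value chosen for illustration only
print(round(partial_correlation(r_listen_read, r_listen_verbal, r_read_verbal), 2))
```

A partial value far below the raw .57 would be consistent with the overlap explanation; a value close to .57 would not.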

Empirical research also has generally supported the relationship
between listening ability and motivation or interest. This may be
explained by the fact that listening is related to the general process of
learning and the relationship between motivation and learning has been
well established (Barker, 1972). However, several experiments have shown
that the effects of interest did not greatly influence listening
comprehension (Heath, 1952; Karraker, 1964). A listening test is a special
case of motivation. Most students want to perform well on tests. Kelly
(1967) argued that listening ability measured in an acknowledged test
situation is different from listening measured under normal conditions.

In addition to research studies, numerous programs which teach listening
skills have been developed, including several commercial packages (e.g.,
Dun Donnelley, 1973; Educational Development Laboratories, 1969; McGraw-Hill,
1969; Science Research Associates, n.d.). Reviews of many studies of listening
instruction state that in many but not all cases these programs have been
effective in improving listening ability (Devine, 1967, 1978; Weaver, 1972).
However, inspection of the contents of these programs indicates that they
differ greatly in terms of the skills that they cover. Furthermore, it
appears that the effectiveness of these programs to some extent depends upon
the match of the instructional objectives and the evaluation instruments.

The various efforts at research and instruction were complemented by the
development of several listening ability tests. Two standardized
listening tests, the Brown-Carlsen (Harcourt Brace Jovanovich, 1955) and
the Sequential Test of Educational Progress (Educational Testing Service,
1957), were developed in the mid-fifties and have been widely used ever
since. However, some evidence indicates that these tests of listening
correlate as well with tests of verbal ability and reading ability as they
do with one another (Kelly, 1965). Perhaps this result can be explained
by the fact that these two tests do not cover the same set of listening
subskills. Besides these two standardized listening tests, numerous others
have been reported in thesis research. However, most of these measures
have not been carefully tested for reliability or validity.

Most efforts related to listening have depended upon the generalizations
which surfaced from the corpus of research described above. It was
based upon this evidence that we embarked on the development of items that
measure listening ability for the National Assessment of Educational
Progress.

NAEP/SCA Pilot Listening Assessment

In June of 1976 the National Assessment of Educational Progress (NAEP)
and the Speech Communication Association (SCA) initiated a pilot study to
test the feasibility of assessing speaking and listening skills (Mead,
1977a, 1977b). The products of this effort were intended for use in the
National Assessment of Educational Progress, a national survey of student
achievement with respect to important educational objectives, funded by
the National Center for Education Statistics.

There are some important differences between National Assessment and
standardized achievement testing programs. The items developed by NAEP
measure specific objectives which are considered important by educators
and content specialists. They do not constitute a test per se. The items
are used to describe the accomplishments of nationally representative
groups of students. They are not used to differentiate levels of ability
among individual students. Nevertheless, the task of developing
listening assessment items was similar to standardized test development
in that it involved defining the domain of listening ability and
constructing items which measured that domain.

The domain description developed by the NAEP/SCA pilot project
reflected a somewhat broader definition of listening than is typical. The
major focus of this description was on the functions or purposes of
communication. These were identified as the informing function and the
controlling (or persuading) function. The functions were further
differentiated by the context or setting of the listening task. These
included formal and informal listening situations. Finally, the domain
was defined in terms of specific listening skills or objectives. This
included general listening comprehension objectives as well as specific
listening analysis objectives.

Another way of describing the domain is by characterizing the listening
stimuli and questions which were developed for the pilot project.
Stimuli representing the informing function included an informative
speech, a telephone call, a newscast, and a public service announcement.
The stimuli representing the controlling function included a persuasive
speech, a paid political announcement, a commercial, and a persuasive
conversation between two friends. Some of the questions which followed
the stimuli measured general listening comprehension, specifically simple
recall, interpretation and application. Other questions measured specific
analysis objectives. These items required identifying appropriate
introductions and conclusions, organizational patterns, types of support
material, types of persuasive appeals, fact-opinion distinctions, and uses
of evidence.



A pool of fifty-six items was developed. The items were packaged into
four test booklets, each representing approximately fifteen minutes of
testing. The items were field tested in four sites which represented a
variety of regions of the country, size and type of cities, and racial and
ethnic populations. An average of 146 students responded to each set of
items.

The items were analyzed using typical item analysis statistics. Item
difficulty was indicated by the percent of students choosing each option.
Item discrimination was indicated by the point biserial correlation
between individuals choosing an option and their total test scores. In
addition, the responses to each option were correlated with an external
criterion which reflected the classification of the students as minority
or nonminority. This added information allowed reviewers to identify items
which received significantly different responses by minority and
nonminority students.
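The two statistics named above can be sketched in a few lines. This is a minimal illustration, not the NAEP scoring procedure: the function names `option_percentages` and `point_biserial` and the toy response data are hypothetical. The point-biserial coefficient is computed as the Pearson correlation between a 0/1 indicator (did the student choose this option?) and a second variable. With total test scores as the second variable it gives option discrimination; with a 0/1 minority/nonminority code it gives the external-criterion correlation described above (for two 0/1 variables it reduces to the phi coefficient).

```python
from statistics import mean, pstdev


def option_percentages(responses, options):
    """Item difficulty as the percent of students choosing each option."""
    n = len(responses)
    return {opt: 100.0 * sum(r == opt for r in responses) / n for opt in options}


def point_biserial(indicator, scores):
    """Point-biserial correlation between a 0/1 indicator and a score.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of the scores; this equals the Pearson r.
    """
    n = len(indicator)
    ones = [s for d, s in zip(indicator, scores) if d == 1]
    zeros = [s for d, s in zip(indicator, scores) if d == 0]
    s = pstdev(scores)
    if not ones or not zeros or s == 0:
        return 0.0  # treated as 0.0: undefined if nobody (or everybody) chose it
    p = len(ones) / n
    q = 1.0 - p
    return (mean(ones) - mean(zeros)) / s * (p * q) ** 0.5


# Toy data (hypothetical): five students' answers to one item and their totals.
answers = ["A", "B", "A", "C", "A"]
totals = [10, 4, 9, 3, 8]
chose_a = [1 if a == "A" else 0 for a in answers]

print(option_percentages(answers, ["A", "B", "C", "D"]))
print(round(point_biserial(chose_a, totals), 3))
```

Reusing one correlation function for both the total-score and the group-membership criterion keeps the two kinds of flags directly comparable.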

It is important to emphasize the purpose of adding the external
criterion which reflected the racial/ethnic background of students to the
information base. The aim of this strategy was not to eliminate all items
which differentiated between minority and nonminority students. It is
possible that there are real differences between these two groups with
respect to listening ability. The information was used to identify items
which might be discriminating between minority and nonminority students
for reasons other than listening skill. For example, an item might
receive different responses because of the varying backgrounds,
experiences, values or language styles of minority and nonminority
students. We considered these factors to be extraneous to listening
ability.
The results of field testing indicated no problems with respect to
selecting items with appropriate difficulty levels and discrimination.
Guidelines had been established for selecting items within the
difficulty range of forty percent to eighty percent correct responses,
with an average of sixty percent (Stanley and Hopkins, 1972), and with a
minimum level of discrimination (Harris, 1968). Because the purpose of
National Assessment is not to build a test but to select items which
measure specific objectives, these guidelines were merely suggestive and
not crucial. Practically all of the items in the pool met the
discrimination requirement, and only about twenty percent of the items
fell outside the proposed difficulty range.
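The screening rule above is simple enough to express directly. The sketch below flags items whose percent correct falls outside the forty-to-eighty percent band, lists items below a discrimination cutoff, and reports the pool's average difficulty for comparison with the sixty percent target. The paper's discrimination cutoff is not recoverable from this copy, so `min_disc` is a required, caller-supplied parameter; the 0.30 used in the example, the item labels, and all data are hypothetical.

```python
def screen_items(p_correct, discrimination, min_disc, low=40.0, high=80.0):
    """Flag items outside a difficulty band or below a discrimination cutoff.

    p_correct: item -> percent of students answering correctly
    discrimination: item -> point-biserial of the keyed option with total score
    min_disc: cutoff supplied by the caller (hypothetical in the example below)
    """
    out_of_band = sorted(i for i, p in p_correct.items() if not low <= p <= high)
    weak = sorted(i for i, r in discrimination.items() if r < min_disc)
    mean_p = sum(p_correct.values()) / len(p_correct)
    share_out = len(out_of_band) / len(p_correct)
    return out_of_band, weak, mean_p, share_out


# Hypothetical pool of five items.
p = {"L01": 35.0, "L02": 62.0, "L03": 71.0, "L04": 85.0, "L05": 55.0}
d = {"L01": 0.45, "L02": 0.38, "L03": 0.22, "L04": 0.51, "L05": 0.40}
flagged, weak, avg, share = screen_items(p, d, min_disc=0.30)
print(flagged, weak, round(avg, 1), share)
```

Because National Assessment treated these guidelines as suggestive, the function only reports flags; deciding whether to drop an item is left to reviewers.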

The surprising result from tryouts was that the listening items, unlike
the items for the other areas of communication competence which were
field tested at the same time, showed a high number of significant point
biserial correlations between option choice and the minority/nonminority
classification of students. Approximately one-half of the listening items
demonstrated this characteristic. It must be emphasized that a
significant correlation between option choice and minority/nonminority
status (a relationship significantly different from zero) was not
considered tantamount to item bias. There were a couple of reasons for
reviewing the data cautiously. First, the tryout sites included two
all-minority schools. This made it possible that the distributions might
include a concentration of minority students within a single option
because of some unusual responses by the students in these schools.
Secondly, a great number of correlations were reviewed, one for each foil
of each item. Among these, there were bound to be some relationships due
to chance (one out of twenty).
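The "one out of twenty" remark is the arithmetic of testing many correlations at the .05 level. The sketch below computes the expected number of coefficients that come out significant by chance alone and, assuming the tests are independent, the probability of at least one. The paper does not state how many options each item had, so the four-option figure is hypothetical; and because the options within one item are mutually exclusive, their correlations are not truly independent, so these numbers are only rough. The Bonferroni-adjusted level is included as one common correction; the pilot project is not reported to have used it.

```python
def expected_chance_hits(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


items, options_per_item = 56, 4  # options_per_item is a hypothetical value
n = items * options_per_item     # one correlation per option of each item
print(n, expected_chance_hits(n), round(prob_at_least_one(n), 4), bonferroni_alpha(n))
```

Even under these rough assumptions, roughly a dozen flagged correlations would be expected with no real group differences at all, which is why each flag was treated as a prompt for review rather than as evidence of bias.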
A significant correlation was considered a signal for further review.
In some cases, the critique indicated possible sources of item bias, such
as a typically white speech pattern in a listening stimulus which
presented a persuasive conversation between two friends, and the item was
dropped. In other cases, however, the review could not detect any
problems and the item was retained. As indicated earlier, a significant
correlation was not considered to be synonymous with item bias. However,
the frequency of this characteristic and the marked difference between
this set of items and the other sets of items (informing speaking,
controlling speaking, ritualizing and sharing feelings) suggested a
special problem.

A panel of speech communication experts reviewed the listening items
and selected approximately one-half for use in the assessment. About
one-third of the selected items reflected significantly different
responses by minority and nonminority students. The consultants
identified very few specific aspects of the listening items which they
felt were indicative of item bias, such as the type of situation
presented, the speech style used, or the values implied. However, they
speculated about a number of general characteristics of the items which
might have tapped factors which were extraneous to measuring listening
ability. These problems included:

1. the vocabulary level of the listening stimuli;

2. the length of the formal speeches;

3. the interest level of the listening stimuli;

4. the accent and rate of speech of the speakers on the stimulus
   tapes; and

5. the level of disruption in the classrooms.

They hypothesized that minority students might have less specialized
vocabulary knowledge; a lower tolerance for long, boring materials; and
less experience listening to the accents and rate of white speakers.
Furthermore, minority students might tend to be concentrated in schools
where there were more disruptions in the classrooms and nearby
environment.

An additional factor which might explain the results is varying levels
of verbal ability of the minority and nonminority students in the tryout
groups. If listening ability overlaps with verbal ability, as previous
research indicates, it is possible that the results might be explained
in terms of different levels of verbal ability. The field testing did
not collect information about the verbal ability of the students. It is
possible that the minority students selected for tryouts reflected an
overall lower level of verbal ability than the nonminority students.

The outcome of the tryout phase of the pilot listening assessment was
the identification of a problem, potential item bias, and no real data to
substantiate or further elaborate the situation. A number of explanations
of the results were proposed. However, these explanations were based on
speculation and not on empirical evidence. The problem of minority bias
had not been clearly articulated in past listening assessment efforts.

The results of the NAEP/SCA pilot project suggested a clear need for
further development and research.

Implications for Future Development of Listening Measures

The message of this paper is that it is not as easy to assess listening
ability as perhaps we have been led to believe. The problem of differing
responses between minority and nonminority students not only flagged a
potential problem in item bias, but also reopened more general issues
regarding listening ability. These include questions about the definition
of the domain of listening ability and about its relationship with other
factors (especially with verbal ability and motivation). A successful
measure of listening ability must deal with all of these issues.

Based on our experience in the NAEP/SCA pilot project, we identified
several recommendations for listening assessment which address the
questions listed above. These guidelines have been adopted for our own
continuing development of listening assessment items and are also
relevant for others interested in measuring this skill.

Recommendation 1: Focus on Skills Which Are Unique and Central to Listening

As indicated in the second section of this paper, there still is no
common definition of listening ability. In selecting from alternatives,
it seems appropriate to focus on skills which are unique and central to
listening.

The skills which are unique to listening involve responding to oral
language. Spoken language is different from written language in that it
tends to be nonlinear, incomplete and redundant. It is ephemeral, it is
accompanied by nonverbal communication, and it often takes place in an
interactive situation. It therefore seems essential to utilize natural
spoken language for listening stimuli. The reworking of reading tests
into listening tests is inappropriate. It is less obvious how to deal
with the nonverbal and interactive nature of oral communication in an
assessment situation. Nonverbal signals tend to be subtle and
individualistic and thus difficult to include in an assessment.
Likewise, the give and take of normal speaking and listening are
difficult to recreate in a test setting.

It is more difficult to identify the most central skills in the
listening domain. In our present effort, we have identified five core
objectives. These reflect a compilation of the skills most often
identified in instructional and assessment materials. They include the
following:

1. be able to recall significant details;

2. be able to comprehend the main idea;

3. be able to draw inferences about the information (e.g.,
   relationships, implications);

4. be able to make judgments concerning the speaker (e.g., intent,
   attitudes); and

5. be able to make judgments concerning the information (e.g., types
   of evidence, logic of arguments).

However, these objectives suffer from the same problems as earlier lists.
They are based on logic, not empirical evidence.

A final concern in defining the domain of listening is the overlap
between listening ability and verbal ability. Listening skill depends
upon knowledge of vocabulary and the ability to manipulate verbal symbols.
However, it is also clear that it is possible to function effectively in
many listening situations with a fairly limited vocabulary and with basic
cognitive skills. Kelly (1967, pp. 455-456) described this contradictory
situation as follows:

In testing situations some of the "best" listeners may be subjects
with high mental ability who normally are relatively inattentive
under non-test circumstances, and some of those who are "good
listeners" under normal (non-test) conditions may do poorly in the
test environment because they were handicapped by the inability to
understand the difficult material frequently found in the tests of
listening.

Since the aim of listening assessment is to focus on skills which are
central to this domain, it seems inappropriate to use materials and
items which tap into high levels of verbal ability merely to gain
discrimination power in the measures. General verbal skills are less
amenable to improvement through listening instruction, and they are
already the focus of other types of assessment measures.
In our current research we are trying to explore the overlap between
listening and verbal ability. Listening stimuli have been classified
according to their vocabulary level by using readability formulae. In
addition, listening items are being field tested along with verbal
ability measures. This way it will be possible to see which stimuli and
which items are particularly related to verbal ability. These efforts
will also help sort out possible explanations for differing responses
among minority and nonminority students in the initial tryouts.
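The two steps described above, scoring stimuli with a readability formula and relating item performance to a verbal ability measure, might look like the sketch below. The paper does not say which readability formulae were used; the Flesch Reading Ease formula is swapped in here as one well-known example, and the vowel-group syllable counter is only a crude approximation. The `pearson` helper and the sample sentences are hypothetical illustrations; in practice the item scores (0/1) and verbal ability scores would come from the field test.

```python
import re
from statistics import mean


def count_syllables(word):
    """Crude estimate: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).

    Higher scores indicate easier text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def pearson(x, y):
    """Pearson correlation, e.g. item scores (0/1) against verbal ability scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


easy = "The game starts at noon. Bring your friends."
hard = ("Institutional considerations necessitate comprehensive "
        "reevaluation of administrative responsibilities.")
print(round(flesch_reading_ease(easy), 1), round(flesch_reading_ease(hard), 1))
```

Items whose scores correlate strongly with the verbal measure, or stimuli with low readability scores, would then be candidates for revision under the guideline above.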


Recommendation 2: Use Short, Interesting Listening Stimuli

The purpose of the present assessment effort and most tests of listening
to date is to measure listening ability under optimal conditions rather
than typical listening behavior in actual situations. As indicated in
section two, motivation plays a critical role in listening. In order to
assess maximum ability, it therefore seems appropriate to make every
effort to encourage students to try their hardest on the items. For many
students the test situation itself is an adequate motivator. However,
more and more students are reacting to the relevance of their school
experiences, including testing.

One problem identified in the initial tryouts was that stimuli were
quite long (each speech was six minutes) and that the contents were
uninteresting to seventeen-year-olds. In the present development effort,
every attempt is being made to use relatively short, interesting
listening stimuli. Stimuli range from one-half minute to three minutes in
length. Materials focus on topics which are generally popular among
teenagers.

It is difficult to find listening materials which all students will
find interesting. Some will enjoy sports, others will not. In some
testing situations, this problem has been countered by using uniformly
boring materials. However, in the current situation it seems more
important to spark interest, even if it means introducing topics which
might not motivate all students to the same degree. In fact, by focusing
on topics and listening situations which are common to most students (for
example, school activities, friendships, television), it is possible to
provide stimuli that are both interesting and universal.

Recommendation 3: Consider Extraneous Factors Which Might Contribute to Item Bias

The primary result of the tryouts for the NAEP/SCA pilot project was
the identification of potential minority bias in items. This finding
highlighted the need for special attention to this problem.

One technique for identifying item bias, which was used in the pilot
project, was to review correlational data regarding the responses of
minority and nonminority students. Although we have already indicated
some problems in using this information, it appeared a useful tool for
finding extraneous factors such as background, experience and values.

Another potential contributor to bias is the quality of the listening
stimulus. It seems likely that students who are used to listening to
nonstandard dialects or to languages other than English might be confused
by the listening stimuli. One solution to this problem is asking the
teacher to read the stimuli in the testing situation. This assumes that
all students are used to listening to their own teacher and will not be
confused by his or her dialect or other speech characteristics. However,
the lack of regularity in this type of testing situation and the
possibility of speech problems among some teachers (e.g., poor
articulation) favor an alternative approach.

In our present research we are developing stimulus tapes which use
network English as the mode for presenting listening material. This
approach is based on the assumption that all students are used to
watching and listening to television. This technique allows for a high
degree of regularity in the testing process. Additional consideration
must be given to the listening environment, assuring that the loudness
and tone of the stimuli are adequate and distractions are minimized.
These approaches should minimize the extraneous factors which cause
problems for minority students.

The recommendations discussed above are general and suggestive. They
are presented primarily as an impetus for further development and
research. The guidelines are meant to encourage those who are involved in
listening research, instruction or testing to explore the area more
definitively rather than to rely on existing data. The recommendations
emerge more from the subjective experience of the NAEP/SCA pilot
listening assessment project than from concrete empirical findings.
These guidelines must be subjected to careful study and research. This
is the goal of current National Assessment activities which are
continuing to explore the area of listening, and hopefully, our efforts
will be amplified by others interested in listening.